Clustering in the Boolean Hypercube in a List Decoding Regime
Authors
Abstract
We consider the following clustering with outliers problem: Given a set of points X ⊂ {−1, 1}^n, such that there is some point z ∈ {−1, 1}^n for which Pr_{x∈X}[⟨x, z⟩ ≥ ε] ≥ δ, find z. We call such a point z a (δ, ε)-center of X. In this work we give lower and upper bounds for the task of finding a (δ, ε)-center. Our main upper bound shows that for values of ε and δ that are larger than 1/polylog(n), there exists a polynomial-time algorithm that finds a (δ − o(1), ε − o(1))-center. Moreover, it outputs a list of centers explaining all of the clusters in the input. Our main lower bound shows that given a set for which there exists a (δ, ε)-center, it is hard to find even a (δ/n^c, ε)-center for some constant c and ε = 1/poly(n), δ = 1/poly(n).
∗ Weizmann Institute of Science and Radcliffe Institute for Advanced Study. Research supported in part by the Israel Science Foundation grant no. 1179/09, by the Binational Science Foundation grant no. 2008293, and by an ERC grant no. 239985.
† Weizmann Institute of Science.
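The definition of a (δ, ε)-center can be made concrete with a small sketch. The code below is not the polynomial-time algorithm from the paper; it only checks the definition and, for tiny n, enumerates all candidate centers by exhaustive search (exponential in n). It assumes the inner product is normalized, ⟨x, z⟩ = (1/n) Σ x_i z_i, which the abstract does not state explicitly. The function names `correlation`, `is_center` and `brute_force_centers` are hypothetical.

```python
from itertools import product


def correlation(x, z):
    """Normalized inner product (an assumption): (1/n) * sum_i x_i * z_i."""
    return sum(a * b for a, b in zip(x, z)) / len(x)


def is_center(X, z, delta, eps):
    """Return True if at least a delta fraction of X has correlation >= eps with z."""
    hits = sum(1 for x in X if correlation(x, z) >= eps)
    return hits >= delta * len(X)


def brute_force_centers(X, delta, eps):
    """List every (delta, eps)-center of X by trying all 2^n points of the hypercube.

    Only practical for very small n; shown purely to illustrate the definition.
    """
    n = len(X[0])
    return [z for z in product((-1, 1), repeat=n) if is_center(X, z, delta, eps)]


if __name__ == "__main__":
    X = [(1, 1, 1, 1), (1, 1, 1, -1), (-1, -1, -1, -1), (-1, 1, -1, 1)]
    z = (1, 1, 1, 1)
    print(is_center(X, z, delta=0.5, eps=0.5))
    print(len(brute_force_centers(X, delta=0.5, eps=0.5)))
```

Returning a list rather than a single point mirrors the list-decoding flavor of the problem: several distinct centers may each explain a δ fraction of the input.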
Similar resources
Improved Skips for Faster Postings List Intersection
Information retrieval can be achieved through computerized processes by generating a list of relevant responses to a query. The document processor, matching function, and query analyzer are the main components of an information retrieval system. Document retrieval systems are fundamentally based on Boolean, vector-space, probabilistic, and language models. In this paper, a new methodology for mat...
Boolean autoencoders and hypercube clustering complexity
We introduce and study the properties of Boolean autoencoder circuits. In particular, we show that the Boolean autoencoder circuit problem is equivalent to a clustering problem on the hypercube. We show that clustering m binary vectors on the n-dimensional hypercube into k clusters is NP-hard, as soon as the number of clusters scales like m^ε (ε > 0), and thus the general Boolean autoencoder prob...
List-decodable zero-rate codes
We consider list-decoding in the zero-rate regime for two cases: the binary alphabet and the spherical codes in Euclidean space. Specifically, we study the maximal τ ∈ [0, 1] for which there exists an arrangement of M balls of relative Hamming radius τ in the binary hypercube (of arbitrary dimension) with the property that no point of the latter is covered by L or more of them. As M → ∞ the maxim...
On Optimal Erasure and List Decoding Schemes of Convolutional Codes
A modified Viterbi algorithm with erasures and list-decoding is introduced. This algorithm is shown to yield the optimal decoding rule of Forney with erasures and variable list-size. For the case of decoding with erasures, the optimal algorithm is compared to the simple algorithm of Yamamoto and Itoh. The comparison shows a remarkable similarity in simulated performance, but with a considerably...
Journal:
- Electronic Colloquium on Computational Complexity (ECCC)
Volume 20
Publication date: 2013